perm filename CHAP6[4,KMC]12 blob sn#060109 filedate 1973-08-23 generic text, type T, neo UTF8
00100	VALIDATION
00200	
00300	6.1 SOME TESTS
00400	
00500		The term "validate" derives from the Latin  VALIDUS=  strong.
00600	Thus  to  validate  X means to strengthen it.   In science it usually
00700	means to strengthen X's acceptability as a hypothesis, theory, or
00800	model.     To  validate is to carry out procedures which show to what
00900	degree X, or its consequences, correspond with facts of  observation.
01000	In the case of an interactive simulation model we can compare samples
01100	of the model's I-O pairs with samples of I-O pairs from the model's
01200	subject.
01300		Since samples of I-O behavior from the model and its  subject
01400	are  being compared, one can always question whether the human sample
01500	is a "good" one, i.e., representative of the process being modelled.
01600	Assuming  that it has been so judged, discrepancies in the comparison
01700	reveal what is not sufficiently understood and must  be  modified  in
01800	the model. After modifications are carried out, a fresh comparison is
01900	made and repeated cycles are made through this process in attempts to
02000	gain  convergence.    Such  a  validation  procedure  characterizes a
02100	progressive (in contrast to a stationary) research program.
02200		Once   a  simulation  model  reaches  a  stage  of  intuitive
02300	adequacy, its builder should consider using more stringent evaluation
02400	procedures  relevant  to  the  model's  purposes. For example, if the
02500	model is to serve as a training device, then a simple evaluation
02600	of its pedagogic effectiveness would be sufficient. But when the
02700	model is proposed as an explanation of a symbolic process, more is
02800	demanded  of  the  evaluation  procedure.  In  the area of simulation
02900	models, Turing's test  has  often  been  suggested  as  a  validation
03000	procedure (Abelson, 1968).
03100		It  is  very easy to become confused about Turing's Test.  In
03200	part this is due to Turing  himself  who  introduced  the  now-famous
03300	imitation   game   in   a  paper  entitled  COMPUTING  MACHINERY  AND
03400	INTELLIGENCE (Turing,1950).  A careful reading of this paper  reveals
03500	there are actually two imitation games, the second of which is
03600	commonly called Turing's test.
03700		In the first imitation game  two  groups  of  judges  try  to
03800	determine  which  of  two interviewees is a woman when one is a woman
03900	and the other is either (a) a man, or (b) a computer.   Communication
04000	between  judge  and  interviewee  is  by  teletype.    Each  judge is
04100	initially informed that one of the interviewees is a woman and one  a
04200	man who will pretend to be a woman. After the interview,  judges  are
04300	asked the "woman-question," i.e., which interviewee was the woman?
04400	Turing does not say what else is told to the judge but one can assume
04500	the  judge is NOT told that a computer is involved nor is he asked to
04600	determine which interviewee is human and which is the computer. Thus,
04700	the  first  group of judges interviews two interviewees:     a woman,
04800	and a man pretending to be a woman.
04900		The  second  group  of  judges  is  given  the  same  initial
05000	instructions,  but  unbeknownst  to  them, the two interviewees are a
05100	woman and a computer programmed to imitate a woman.   Both groups  of
05200	judges play this game until sufficient statistical data are collected
05300	to show how often the  right  identification  is  made.  The  crucial
05400	question  then  is:   do  the judges decide wrongly AS OFTEN when the
05500	game is played with man and  woman  as  when  it  is  played  with  a
05600	computer substituted for the man? If so, then the program is
05700	considered to have succeeded in imitating a woman to the same  degree
05800	as  the  man  imitating  a  woman.  In being asked the woman-question
05900	judges are not required to identify which interviewee  is  human  and
06000	which is machine.
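Turing does not say how "AS OFTEN" should be decided. One conventional choice, offered here only as an illustrative sketch, is a pooled two-proportion z statistic comparing the rate of wrong answers to the woman-question in the two groups of judges. The function name and the counts in the example are hypothetical; they are not data from Turing's paper or from our study.

```python
import math

def misidentification_rates(wrong_a: int, games_a: int,
                            wrong_b: int, games_b: int):
    """Compare wrong answers to the woman-question in two versions of the game.

    Game A: woman vs. man imitating a woman.
    Game B: woman vs. computer imitating a woman.
    Returns (rate_a, rate_b, z), z being a pooled two-proportion statistic;
    a z near 0 means the judges erred about equally often in both games.
    """
    rate_a = wrong_a / games_a
    rate_b = wrong_b / games_b
    pooled = (wrong_a + wrong_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    if se == 0:
        raise ValueError("all answers right or all wrong: z is undefined")
    return rate_a, rate_b, (rate_a - rate_b) / se

# Hypothetical counts, for illustration only.
print(misidentification_rates(30, 100, 28, 100))
```

With 30 wrong answers in 100 man-woman games and 28 in 100 machine-woman games, z is about 0.31, so this hypothetical outcome would not distinguish the computer's imitation from the man's.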
06100		Turing  then proposes a variation of the first game, a second
06200	game in which one interviewee is a man and one  is  a  computer.  The
06300	judge  is asked the "machine-question": which is the man and which is
06400	the machine? It is this second version of the game which is commonly thought
06500	of as Turing's test.
06600		In   the   course  of  testing  our  simulation  of  paranoid
06700	linguistic behavior in a psychiatric interview, we conducted a number
06800	of Turing-like indistinguishability tests (Colby, Hilf, Weber and
06900	Kraemer, 1972). The tests were "Turing-like" in that while they were
07000	conversational tests, they were not exactly the games described above.
07100	As  an  experimental design, Turing's games are unsatisfactory. There
07200	exist no known experts in making  judgements  along  a  dimension  of
07300	womanliness, and the ability to deceive on the part of the man
07400	introduces a confounding variable.  In designing our  tests  we  were
07500	primarily  interested in learning more about developing the model and
07600	we did not think the simple machine-question would contribute to this
07700	end.
07800	6.2 METHOD
07900		To gather  data  we  used  a  technique  of  machine-mediated
08000	interviewing  (Hilf,  Colby, Smith, Wittner, and Hall, 1971) in which
08100	the participants communicate by means of  teletypes  connected  to  a
08200	computer  programmed  to  store  each message in a buffer until it is
08300	sent  to  the  receiver.   The   technique   eliminates   para-   and
08400	extralinguistic  features found in the usual vis-a-vis interviews and
08500	in teletyped interviews where the participants communicate directly.
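The buffering idea can be sketched as follows. This is a hypothetical illustration, not the program actually used: the class name, the newline as end-of-message marker and the backspace handling are all assumptions. What it shows is that the receiver gets only completed messages, never the timing, hesitations or corrections of the sender.

```python
from collections import deque

class MediatedChannel:
    """Hypothetical relay between two teletypes.

    Keystrokes from the sender accumulate in a buffer; only a completed
    message (ended by a newline) is released to the receiver, so pauses,
    typing speed and corrections never reach the other side.
    """

    def __init__(self):
        self._buffer = []        # characters of the message being typed
        self._ready = deque()    # completed messages awaiting delivery

    def keystroke(self, ch: str) -> None:
        if ch == "\n":           # end of message: release it as one unit
            self._ready.append("".join(self._buffer))
            self._buffer.clear()
        elif ch == "\b":         # backspace: the correction stays invisible
            if self._buffer:
                self._buffer.pop()
        else:
            self._buffer.append(ch)

    def receive(self):
        """Return the next completed message, or None if none is ready."""
        return self._ready.popleft() if self._ready else None

channel = MediatedChannel()
for key in "I AM UPSEY\bT\n":    # the typo and its correction are buffered
    channel.keystroke(key)
message = channel.receive()
print(message)
```

The receiver sees only "I AM UPSET"; the mistyped letter and its erasure are absorbed by the buffer.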
08600	
08700		Using  this  technique,  a psychiatrist-judge interviewed two
08800	patients, one after the other.   In half the runs the first interview
08900	was  with a human paranoid patient and in half the first was with the
09000	paranoid model. Two versions (weak and  strong)  of  the  model  were
09100	utilized.   The strong version was more paranoid and exhibited a
09200	delusional  system  while  the weak version was suspicious but lacked
09300	systematized delusions. When the model was the interviewee, Sylvia
09400	Weber monitored the input expressions from the interview judge for
09500	inadmissible teletype characters and misspellings. (The program's
09600	algorithms are very sensitive to even the slightest such errors.) If these were found,
09700	she retyped the input expression correctly to the program.  Otherwise
09800	the  judge's  message  was sent on to the model.  The monitor did not
09900	modify or  edit  the  model's  output  expressions  which  were  sent
10000	directly  back  to  the  judge.    When the interviewee was an actual
10100	human patient, the dialogue took place without a monitor in the  loop
10200	since we did not feel the asymmetry to be significant.
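Part of the monitor's screening could in principle be automated. The sketch below is hypothetical: the admissible character set and the tiny lexicon are invented for illustration, since the chapter lists neither, and in our runs the screening and retyping were done by a person. It inspects only the judge's input and never the model's output, matching the asymmetry described above.

```python
import re

# Hypothetical admissible characters and lexicon, for illustration only.
ADMISSIBLE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?'-")
LEXICON = {"HOW", "DID", "YOU", "COME", "TO", "BE", "IN", "THE", "HOSPITAL"}

def screen_input(message: str):
    """Return (bad_characters, unknown_words) found in a judge's message.

    Any non-empty result means the message should be retyped correctly
    before it is passed to the model. Model output is never screened.
    """
    text = message.upper()
    bad_characters = sorted({ch for ch in text if ch not in ADMISSIBLE})
    unknown_words = [w for w in re.findall(r"[A-Z']+", text) if w not in LEXICON]
    return bad_characters, unknown_words

print(screen_input("HOW DID YOU COME TO BE IN THE HOSPTAL#"))
```

A clean message yields two empty lists and would be sent on unchanged; here the stray "#" and the misspelling "HOSPTAL" are both flagged for retyping.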
10300	
10400	6.3 PATIENTS
10500	The human patients (N=3, with one patient participating 6
10600	times)  were  diagnosed  as  paranoid  by the psychiatric staff of an
10700	acute ward in a psychiatric hospital.  The  ward  chief  psychiatrist
10800	selected  the  patients  and  asked  them if they would be willing to
10900	participate in a  study  of  psychiatric  interviewing  by  means  of
11000	teletypes.   He  explained  that  they  would  be  interviewed  by  a
11100	psychiatrist over a teletype.  I sat with the patient while he  typed
11200	or  typed  for  him  if  he  was  unable  to  do so.  The patient was
11300	encouraged to respond freely using his own words.     Each  interview
11400	lasted  30-40  minutes.  Two patients were set up for each run of the
11500	experiment  to  guarantee  having  a  subject.   In  spite  of   this
11600	precaution,   on  several  occasions  the  experiment  could  not  be
11700	conducted  because  of  the  patient's  inability   or   refusal   to
11800	participate. Also, there were computer breakdowns at early points in
11900	interviews when too few I-O pairs had been collected to  be  included
12000	in the statistical results.
12100	
12200	
12300	6.4 JUDGES
12400		Two groups of psychiatric judges were used.  One  group,  the
12500	"interview  judges"  (N=8) conducted the machine-mediated interviews.
12600	The other group, the "protocol judges"  (N=33)  read  and  rated  the
12700	interview  protocols. From these two groups of judges we were able to
12800	accumulate a large number of observations (in the  form  of  ratings)
12900	necessary  for the required statistical tests.   The interview judges
13000	were psychiatrists experienced in private,  outpatient  and  hospital
13100	practice  who  volunteered  to participate. Each was told he would be
13200	interviewing   hospitalized   patients   by   means   of    teletyped
13300	communication  and  that  this  technique was being used to eliminate
13400	para- and extralinguistic cues. He was not told until after the
13500	two  interviews  that  one of the patients might be a computer model.
13600	While the interview judges were aware that a computer was being used to
13700	transmit messages, none knew we had constructed a paranoid simulation. Naturally, some
13800	interview judges suspected that a computer was being  used  for  more
13900	than message transmission.
14000	
14100		Each interview judge was asked to rate the degree of paranoia
14200	he  detected  in the patient's responses on a 0-9 scale, 0 meaning no
14300	paranoia and 9 meaning extreme paranoia.  The judge made two  ratings
14400	after  each  I-O pair in the interview.  The first rating represented
14500	his estimate of the degree of "paranoidness" in a particular response
14600	(designated  as  "Response"  in  the  interview extracts below).  The
14700	second rating represented the judge's global estimate of the  overall
14800	degree  of  "paranoidness" of the patient resulting from the totality
14900	of the patient's responses up to this point (designated as  "Patient"
15000	in  the interview extracts below). The interview judge's ratings were
15100	entered on the teletype and saved on  a  disc  file  along  with  the
15200	interview.     Franklin   Dennis   Hilf  sat  with  the  interviewing
15300	psychiatrist during both interviews.  Each interview judge was  asked
15400	not  only  to rate the patient's response but to give his reasons for
15500	these ratings.  His reasons and other comments were tape recorded  as
15600	the interview progressed.
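The record each interview judge produced for one I-O pair can be sketched as a small data structure. The field names and the JSON-lines layout are assumptions for illustration; the chapter says only that two 0-9 ratings followed each I-O pair and were saved on a disc file along with the interview. The example uses the first I-O pair and ratings of Excerpt 1.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RatedPair:
    """One I-O pair with the interview judge's two ratings (0-9 scale)."""
    index: int
    doctor: str
    patient: str
    response_rating: int   # paranoia in this particular response
    patient_rating: int    # global estimate of the patient up to this point

    def __post_init__(self):
        for name in ("response_rating", "patient_rating"):
            value = getattr(self, name)
            if not 0 <= value <= 9:
                raise ValueError(f"{name} must be between 0 and 9, got {value}")

def to_json_lines(pairs):
    """Serialize a rated transcript, one JSON object per line (hypothetical format)."""
    return "\n".join(json.dumps(asdict(pair)) for pair in pairs)

first = RatedPair(1, "HOW DID YOU COME TO BE IN THE HOSPITAL?", "I AM UPSET", 1, 5)
print(to_json_lines([first]))
```

Rejecting out-of-range ratings at construction time keeps malformed entries out of the saved file; the judge's spoken reasons, which were tape recorded separately, are not part of this record.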
15700	
15800		Here  are  a few excerpts from the interviews to indicate the
15900	nature of the data collected.  The I-O pairs of the interview  appear
16000	in the left column, along with the judge's ratings in braces. A
16100	transcription of the judge's tape-recorded comments appears on the
16200	right.
16300	
16400	EXCERPT Nr. 1:
16500	The following interview is with the strong version of the paranoid
16600	model.
16700	
16800	     	I-O Pairs			Comments By Judge Nr. 1
16900	
17000	{1} Dr. - HOW DID YOU COME TO BE
17100	          IN THE HOSPITAL?
17200	    Pt. - I AM UPSET                I associate being upset and 
17300					    agitated more with paranoia 
17400					    going on in the present.
17500	      {RESPONSE 1  PATIENT 5}
17600	{2} Dr. - HOW LONG HAVE YOU BEEN
17700	          IN THE HOSPITAL?
17800	    Pt. - ABOUT A WEEK              An acute problem.  Again some-
17900	                                    thing I associate with 
18000					    paranoia.
18100	      {RESPONSE 1 PATIENT 5}
18200	{3} Dr. - IN WHAT WAY ARE YOU
18300	          UPSET?
18400	    Pt. - PEOPLE MAKE ME NERVOUS    Right for paranoia.  Maybe an
18500					    advanced anxiety state.
18600	      {RESPONSE 5 PATIENT 5}
18700	{4} Dr. - IN WHAT WAY DO THEY
18800	          MAKE YOU NERVOUS?
18900	    Pt. - PEOPLE GIVE ME A FUNNY   Funny look suggests an idea of
19000	          LOOK SOMETIMES            reference which is in same ball
19100					    park as paranoia. Whether there
19200					    are overt delusions or not we
19300					    still don't know.
19400	      {RESPONSE 6  PATIENT 5}
19500	{5} Dr. - IS THERE ANYTHING ABOUT  
19600	          PEOPLE IN PARTICULAR?
19700	    Pt. - CERTAIN PERSONS          An interesting clue.  We'll
19800					   have to find out what he 
19900	      {RESPONSE 2  PATIENT 6}      thinks about these people.
20000	{6} Dr. - WHO ARE THE CERTAIN 
20100	          PERSONS AND WHAT IS IT
20200	          THAT THEY DO?
20300	    Pt. - DO YOU KNOW HOW THE      Bringing up the underworld 
20400	          UNDERWORLD OPERATES?     suggests a delusional system.
20500					   We don't know whether he is
20600		{RESPONSE 9 PATIENT 9}     being worked against or
20700					   whether the underworld is 
20800					   on his side.
20900	
21000	{7} Dr. - HOW DO YOU MEAN?
21100	    Pt. - SHOULDN'T I BE? 	   This question didn't work too well.
21200	
21300	      {RESPONSE 1 PATIENT 9}
21400	{8} Dr. - WHY DID YOU ASK ABOUT
21500	          THE UNDERWORLD?
21600	    Pt. - WHAT INFORMATION DO      This is a kind of evasiveness
21700	          YOU HAVE ON THE UNDER-   about my evasiveness. Common
21800	          WORLD?                   with paranoia in my experi-
21900					   ence. Pretty characteristic.
22000	
22100	      {RESPONSE 6 PATIENT 9}
22200	{9} Dr. - I DON'T KNOW ABOUT THE
22300	          UNDERWORLD BUT WHEN
22400	          YOU MENTIONED THE UNDER-
22500	          WORLD I THOUGHT YOU
22600	          MUST HAVE SOME CONTACT
22700	          WITH IT AND THAT'S
22800	          WHAT I WOULD LIKE TO
22900	          KNOW MORE ABOUT.
23000	    Pt. - I TRY TO AVOID THE UNDER-  This definitely feels paranoid.
23100	          WORLD.                     It has that feeling I associate
23200	             			     with twisting. Avoiding answering
23300	      {RESPONSE 9 PATIENT 9}         directly or going into detail.
23400					     Being suspicious of my ques-
23500					     tioning. I'm already involved
23600					     in the patient's delusional
23700					     system.
23800	
23900	{10} Dr. - HAVE YOU BEEN ACTIVELY
24000	           TRYING TO AVOID THE
24100	           UNDERWORLD?  
24200	     Pt. - NO ONE HAS ANY POWER      This response is a non-sequitur.
24300	           OVER GANGSTERS            It's possible he's become so
24400					     defensive about the under-
24500	       {RESPONSE 2 PATIENT 9}        world he's decided not to
24600					     answer directly.
24700	
24800	{11} Dr. - WHO BROUGHT YOU INTO
24900	           THE HOSPITAL?
25000	     Pt. - THE POLICE                Police often bring paranoid
25100					     patients into the hospital.
25200	       {RESPONSE 1 PATIENT 9}        It is the truth or could 
25300					     be the truth for this
25400					     patient.
25500	
25600	{12} Dr. - WHY DID THEY BRING YOU
25700	           IN?
25800	     Pt. - COPS DON'T DO THEIR JOB   I've heard that a number of
25900	                         	     times from patients.
26000	       {RESPONSE 9 PATIENT 9}        
26100	 
26200	{13} Dr. - WELL I HOPE YOU DIDN'T
26300	           JUST LIE DOWN AND LET
26400	           THEM DRAG YOU IN.
26500	     Pt. - COPS ARREST THE WRONG     Another paranoid response.
26600	           PEOPLE
26700	
26800	       {RESPONSE 9 PATIENT 9}
26900	
27000	{14} Dr. - BYE.
27100	     Pt. - GOOD BYE
27200	       {RESPONSE 1 PATIENT 9}
27300	
27400	EXCERPT Nr. 2:
27500	This excerpt is from an interview with the weak (less paranoid)
27600	version of the model.
27700		I-O Pairs		    Comments By Judge Nr.2
27800	{9} Dr. - TELL ME SOMETHING ABOUT
27900	          YOURSELF
28000	    Pt. - I WENT TO THE TRACK AT     Response says so little it's
28100	          BAY MEADOWS A WHILE        hard to rate it. It may be 
28200	          BACK         		     guarded. Doesn't change my
28300	                                     estimate of patient from 
28400					     previous responses.
28500	      {RESPONSE 0 PATIENT 2}
28600	{10} Dr. - WHAT HAPPENED THERE?
28700	     Pt. - THERE'S A LOT OF CROOKED   Sounds mildly paranoid or un-
28800	           GAMBLING GOING ON THERE    realistically suspicious.
28900	       {RESPONSE 0 PATIENT 2}
29000	
29100	{11} Dr. - WHY DO YOU THINK THE
29200	           GAMBLING IS CROOKED?
29300	     Pt. - I HAVE WORRIED THAT         I'd hate to have to justify
29400	           GANGSTERS HAVE TOO MUCH     that one  as 4 compared to
29500	           POWER IN THIS COUNTRY       some other responses. Maybe
29600					       I'm not rating that statement
29700		{RESPONSE 4 PATIENT 4}	      others, like that it's upsetting.
29800					      Alone I would give it a 3.
29900	
30000	{12} Dr. - WOULD YOU TELL ME WHAT      
30100	           OTHER THINGS WORRY YOU?
30200	     Pt. - I THOUGHT I TOLD YOU        "Yes you did." Irritation but
30300	           PEOPLE GET ON MY NERVES     not paranoid response. Nothing
30400					       seems to lower my previous
30500	       {RESPONSE 1 PATIENT 3}          rating of patient.
30600	
30700	{13} Dr. - HOW DO YOU LIKE THE
30800	           HOSPITAL?
30900	     Pt. - I DON'T COMMUNICATE          (nothing relevant)
31000	           VERY MUCH WITH OTHERS
31100	       {RESPONSE 0 PATIENT 3}
31200	
31300	{14} Dr. - DO OTHER PEOPLE AT THE 
31400	           HOSPITAL GET ON YOUR
31500	           NERVES?
31600	     Pt. - I SHOULDN'T HAVE COME        (nothing relevant)
31700	           HERE
31800	
31900	       {RESPONSE 2 PATIENT 3}
32000	
32100	{15} Dr. - WHY DO YOU SAY YOU 
32200	           SHOULDN'T HAVE COME?
32300	     Pt. - THEY MADE ME COME HERE        "Oh, they did hum?" Maybe a
32400					         statement of fact he was
32500	       {RESPONSE 4 PATIENT 3}            committed or pressured into  
32600	   				         going. Has ring of other
32700					         people controlling him,
32800					         having a lot of influence
32900					         over him.
33000	
33100	{16} Dr.- THANK YOU FOR TALKING  
33200	          WITH ME, BYE
33300	     Pt. - GOOD BYE                      In general he comes across 
33400						 as terse, guarded guy. Brief
33500	       {RESPONSE 0 PATIENT 3}            spoken guy who occasionally
33600						 seems to be angered by
33700						 questions. I'm inferring 
33800					         there's a good bit of
33900						  guarding.
34000						 The quality of some of his
34100						 comments cause me to feel he  
34200						 is moderately paranoid.
34300	
34400	EXCERPT Nr. 3:
34500	This excerpt is from a hospitalized paranoid patient.
34600	     I-O Pairs				      Comments By Judge Nr. 3
34700	{15} Dr. - I UNDERSTAND HOW YOU
34800	           MIGHT BE AFRAID, CAN YOU
34900	           TELL ME A BIT ABOUT HOW
35000	           YOU ARE DEPRESSED?
35100	     Pt. - THEY'VE SENT ALL THESE         Would be important to know,
35200	           HELL'S ANGELS UP HERE          is this really true or an
35300	           AND THEY HAVE BEEN ON          exaggeration or a delusion?
35400	           MY BACK FOR THE PAST SIX       Seems preoccupied with the
35500	           MONTHS                         threat to his life and
35600						  unwilling to talk about
35700						  anything else but the 
35800						  threat the Hell's
35900						  Angels pose.
36000	      {RESPONSE 4 PATIENT 4}
36100	
36200	{16} Dr. - WHAT DO YOU MEAN THEY
36300	           SENT THE HELL'S ANGELS UP
36400	           HERE?
36500	     Pt. - FROM DIFFERENT SECTORS OF      Vagueness a bit alarming.
36600	           MOTORCYCLE GANGS. A LOT OF
36700	           THEM ARE SERVICE CONNECTED
36800	
36900	       {RESPONSE 6 PATIENT 4}
37000	
37100	{17} Dr. - YOU MEAN YOU FEEL IN 
37200	           DANGER EVEN HERE IN THE
37300	           HOSPITAL?
37400	     Pt. - YES      			  If it's true what has gone
37500						  on before, it must be a
37600						  frightening situation
37700						  for the patient.
37800	      {RESPONSE 6 PATIENT 4}
37900	
38000	{18} Dr. - THAT MUST REALLY BE AWFUL,  
38100	           HERE YOU COME TO THE HOSPITAL
38200	           FOR HELP AND YOU ARE STILL FRIGHTENED
38300	     Pt. - I CAN'T EVEN TRUST MY
38400	           DOCTOR ANYMORE                 Some heightening of feeling
38500						  this man is paranoid.
38600						  Genuinely frightened and 
38700	      {RESPONSE 6 PATIENT 5}             can count on no one for
38800			       			    help.
38900	
39000	{19} Dr. - WHAT WOULD BE OF HELP
39100	           TO YOU NOW?
39200	     Pt. - I'M GETTING SLEEPY NOW         Patient has organized a
39300	           AND HAVE TO GO TO BED.         system of people who will
39400	           THE NUMBER ONE GUY ALWAYS      harm him. The remark about
39500						  letters in the last name 
39600	           HAS FOUR AND FIVE LETTERS      gives some credence to 
39700	           IN HIS LAST NAME               idea this is delusional
39800						  system.
39900	
40000	        {RESPONSE 7 PATIENT 7}
40100	
40200	EXCERPT Nr. 4:
40300	This excerpt is from an interview by a different judge with the same
40400	patient as in excerpt 3.
40500	      I-O Pairs				Comments By Judge Nr. 4
40600	{1} Dr. - COULD YOU TELL ME A
40700	          LITTLE BIT ABOUT YOUR 
40800	          FEAR OF DYING AND IS THIS
40900	          SOMETHING YOU HAVE FELT
41000	          IN THE PAST?
41100	    Pt. - I HAVE FELT THIS FOR AL-      He responded appropriately.
41200	          MOST A YEAR                   Response is concrete. He
41300						doesn't tell me much
41400	     {RESPONSE 7 PATIENT 9}             about his fear of dying.
41500						 Somewhat guarded response.
41600	
41700	{2} Dr. - COULD YOU TELL ME A 
41800	          LITTLE MORE ABOUT YOUR
41900	          FEAR OF DYING AND HAVE
42000	          YOU ANY THOUGHTS ABOUT
42100	          HOW IT IS GOING TO HAPPEN
42200	          AND WHETHER ANYONE WANTS
42300	          TO HURT YOU?
42400	    Pt. - I WILL END UP IN THE SEA   	Somewhat strange  response.
42500	          AND THE HELL'S ANGELS WILL    Preoccupation with dying.
42600	          DO IT       			Hell's Angels symbolic,
42700						dangerous people in the
42800						culture. I doubt if he has
42900	      {RESPONSE 9 PATIENT 9}   		any contact with the Hell's
43000						Angels.
43100	
43200	{3} Dr. - COULD YOU TELL ME A 
43300	          LITTLE BIT ABOUT THE KINDS
43400	          OF CONTACT YOU HAVE HAD
43500	          WITH THE HELL'S ANGELS IN
43600	          THE PAST?
43700	    Pt. - I HAVE KNOWN SOME OF THEIR 	Answer hard to evaluate. He
43800	          DEALERS AND PUSHERS           may be telling the truth,
43900						it may be his fantasy. Maybe
44000						guy is in for drug addiction.
44100		{RESPONSE 6 PATIENT 9}		Somewhat concrete, guarded,
44200						and frightened.
44300	
44400	{4} Dr. - COULD YOU SAY A LITTLE
44500	          MORE ABOUT THE CIRCUMSTANCES
44600	          IN WHICH YOU HAVE KNOWN SOME 
44700	          OF THEIR DEALERS AND PUSHERS?
44800	    Pt. - THEY WERE MEMBERS OF MY    	It doesn't really answer the
44900	          COMMUNITY WHEN I GOT OUT      question, a little on a tan-
45000	          OF THE SERVICE THEY HAD       gent unconnected to the
45100	          BEEN MY FRIENDS FOR SO LONG   information I am asking. Does
45200						not tell me very much. Again
45300						guarded response.
45400	      {RESPONSE 6 PATIENT 8}
45500	
45600	{5} Dr. - DID YOU DEAL WITH THEM
45700	          YOURSELF AND HAVE YOU
45800	          BEEN ON DRUGS OR NAR-
45900	          COTICS EITHER NOW OR
46000	          IN THE PAST?
46100	    Pt. - YES I HAVE IN THE PAST     	To differentiate him from
46200	          BEEN ON MARIHUANA REDS        previous patient, at least
46300	          BENNIES LSD       		there is a certain amount
46400						of appropriateness to the
46500						answer although it doesn't
46600						tell me much about what I
46700	       {RESPONSE 3 PATIENT 7}		asked at least it's not
46800						bizarre. If I had him in my
46900						 office I would feel con-
47000						fident I could get more
47100						information if I didn't
47200						have to go through the
47300						teletype. He's a little more
47400						willing to talk than the
47500						 previous person. Answer
47600						to the question is fairly
47700						appropriate though not 
47800						extensive. Much less of a 
47900						flavor of paranoia than
48000						any of previous responses.
48100	
48200	{6} Dr. - COULD YOU TELL ME HOW      	
48300	          LONG YOU HAVE BEEN IN THE
48400	          HOSPITAL AND SOMETHING
48500	          ABOUT THE CIRCUMSTANCES
48600	          THAT BROUGHT YOU HERE?
48700	    Pt. - CLOSE TO A YEAR AND		Response somewhat appropriate 
48800	          PARANOIA BROUGHT ME 		but doesn't tell me much.
48900	          HERE				The fact that he uses the
49000						word paranoia in the way
49100						 that he does without
49200	      {RESPONSE 5 PATIENT 7}		any other information,
49300						indicates maybe it's a label 
49400						he picked up on the ward 
49500	                                        or from his doctor.
49600						Lack of any kind of under-
49700						standing about  himself.
49800						Dearth, lack of information.
49900						He's in some remission. Seems
50000						somewhat like a put-on. Seems
50100						he was paranoid and is in 
50200						some remission at this time.
50300	
50400	{7} Dr. - COULD YOU SAY SOMETHING
50500	          NOW ABOUT YOUR PARANOID 
50600	          FEELINGS BOTH AT THE 
50700	          TIME OF ADMISSION AND
50800	          DO YOU HAVE SIMILAR FEELINGS
50900	          NOW AND IF SO HOW DO THEY 
51000	          AFFECT YOU?
51100	    Pt. - AT THE TIME OF ADMISSION	This response moves paranoia 
51200	          I THOUGHT THE MAFIA WAS  	back up. Stretching reality 
51300	          AFTER ME AND NOW ITS THE	somewhat to think Hell's Angels 
51400	          HELL'S ANGELS			are still interested in him.
51500						Somewhat bizarre in terms of 
51600	                                        content. Quite paranoid.
51700	      {RESPONSE 8 PATIENT 9}		Still paranoid. Gross and primitive
51800						responses. In middle of interview I
51900						felt patient was in touch but now
52000						responses have more concrete aspect
52100	
52200	{8} Dr. - DO YOU HAVE ANY THOUGHT
52300	          AS TO WHY THESE TWO
52400	          GROUPS WERE AFTER YOU?
52500	    Pt. - BECAUSE I STOPPED SOME 	Response seems far fetched 
52600	          OF THEIR DRUG SUPPLY		and hard to believe unless 
52700						he was a narcotic agent which 
52800						I doubt. Sounds somewhat 
52900	      {RESPONSE 9 PATIENT 9}		grandiose, magical, paranoid
53000						flavor, in general indicates 
53100						he's psychotic, paranoid 
53200						schizophrenic with delusions  
53300						about these two groups and 
53400						I wouldn't rule out
53500						some hallucinations as well.
53600						Appropriateness of response 
53700						answers question in concrete 
53800						but unbelievable way.
53900	
54000	6.5 ANALYSIS (1)
54100	The candidate protocol judges (N=105) were selected from the 1970
54200	American  Psychiatric  Association  Directory using a table of random
54300	numbers. They  were  initially  not  informed  that  a  computer  was
54400	involved.  (After the experiment the judges were fully informed as to
54500	its purpose and results.) The  105  names  were  divided  into  eight
54600	groups.  Each  member  of  a  group  was  sent  transcripts  of three
54700	interviews along with a cover letter requesting  their  participation
54800	in the experiment. The interview transcripts consisted of:
54900		1)An interview conducted by one of the eight judges with the
55000		  paranoid model,
55100		2)An interview conducted by the same interview judge with a 
55200		  human paranoid patient, and
55300		3)An interview conducted by a different psychiatrist with a 
55400		  human patient who was not clinically paranoid.
55500	
55600	After each input-output pair in the transcripts there were two  lines
55700	of  rating numbers such that the protocol judges could circle numbers
55800	corresponding to their ratings of both the previous response  of  the
55900	patient,  and  an  overall  evaluation of the patient on the paranoid
56000	continuum. Thirty-three protocol judges returned the rated protocols
56100	properly filled out, and all of these were used in our data.
56200	
56300		The  interviews  with  nonparanoid  patients were included to
56400	control for the  hypothesis  that  any  teletyped  interview  with  a
56500	patient  might  be  judged  "paranoid". However, virtually all of the
56600	ratings of the nonparanoid interviews were 0 for paranoia. Hence the
56700	hypothesis was falsified.
56800	
56900	
57000		The first index of resemblance examined  was  simple,  namely
57100	the  final  overall  rating  given  the  patient  and  the model; the
57200	question was which was rated as being more paranoid, the patient, the
57300	model, or neither? (See Table 1.) The protocol judges were more
57400	likely than the interview judges to distinguish the overall paranoid level of the model and the
57500	patient. In 37.5% of the paired interviews, the interview judges gave
57600	tied scores to the model and the patient as contrasted to only 9%  of
57700	the protocol judges.  Of the 35 non-tied paired ratings, 15 rated the
57800	model as being more paranoid.  If p is the theoretical probability of
57900	a  judge  judging  the  model  more  paranoid  than  a human paranoid
58000	patient, we find the 95% confidence interval for p to be .27 to  .59.
58100	Since p=.5 indicates indistinguishability of overall ratings of model
58200	and patient, and since the interval around our observed p=.43 includes .5, the results support the claim
58300	that the model is a successful simulation of a paranoid patient.
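The interval above can be checked with a normal-approximation (Wald) interval for a binomial proportion. We assume that is the method used, since the chapter does not name it. The tie counts of 3 and 3 are inferred from the reported 37.5% of 8 interview judges and 9% of 33 protocol judges, which together leave the 35 untied pairs cited in the text.

```python
import math

def wald_interval(successes: int, trials: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% interval for a binomial proportion."""
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return p, p - half, p + half

# Counts taken from, or implied by, the text: 8 interview judges and
# 33 protocol judges give 41 paired ratings; 37.5% of 8 and about 9% of 33
# are 3 ties each, leaving 35 untied pairs, 15 of which favored the model.
paired = 8 + 33
ties = 3 + 3
untied = paired - ties
p_hat, lower, upper = wald_interval(15, untied)
indistinguishable = lower <= 0.5 <= upper   # does the interval cover p = .5?
print(f"p = {p_hat:.2f}, 95% CI {lower:.2f} to {upper:.2f}, covers .5: {indistinguishable}")
```

This prints an interval of about .26 to .59. The reported lower bound of .27 is obtained if p is first rounded to .43 before the interval is computed.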
58400	
58500		Separate analysis of the strong  and  weak  versions  of  the
58600	paranoid model indicated that indeed the strong model was judged more
58700	paranoid than the paranoid patients, the weak version less  paranoid.
58800	Thus  a  change  in  the  parameter  structure  of the paranoid model
58900	produced a change along the dimension of  paranoid  behavior  in  the
59000	expected direction.
59100	
59200	(TABLE 1)
59300	Relative final overall ratings of paranoid model vs. paranoid
59400	patient, indicating which was given the higher overall rating of paranoia
59500	at the end of the interview.
59600		(INSERT TABLE 1 HERE)
59700	
59800	
59900	
60000	
60100	
60200	
60300	
60400	
60500	6.6 ANALYSIS (2)
60600		The second index of resemblance is a more  sensitive  measure
60700	based on the two series of response ratings in the paired interviews.
60800	The  statistic  used  is  basically  the  standardized   Mann-Whitney
60900	statistic (Siegel, 1956).
61000			(INSERT EQUATION HERE)
61100	
61200	where R is the sum of the ranks of the response ratings in the series
61300	of ratings given to the model, n the number of responses given by the
61400	model,  m  the  number  of  responses  given  by the patient.  If the
61500	ratings given by a judge are randomly allocated to model and patient,
61600	i.e. model and patient are indistinguishable in response ratings, the
61700	expected value of Z is 0, with unit standard  deviation.   If  higher
61800	ratings  are  more  likely to be assigned to the model, Z is positive
61900	and, conversely, negative values of Z indicate greater likelihood  of
62000	assigning  higher  ratings to the patient. Each judge in evaluating a
62100	pair of interviews generates a single value of Z.
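Since the equation itself is not reproduced in this draft, the sketch below assumes the usual large-sample form of the standardized rank-sum statistic, Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12), with tied ratings given the average of the ranks they span and no tie correction to the variance. The function name is hypothetical; it shows how one judge's two series of 0-9 ratings become a single Z.

```python
import math

def rank_sum_z(model_ratings, patient_ratings):
    """Standardized rank-sum statistic Z for one judge's paired interviews.

    R is the sum of the ranks of the model's ratings within the pooled,
    sorted series; tied ratings share the average of the ranks they span.
    Large-sample form, no tie correction to the variance (an assumption).
    """
    n, m = len(model_ratings), len(patient_ratings)
    pooled = sorted(list(model_ratings) + list(patient_ratings))
    average_rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # positions i..j-1 hold equal ratings; their ranks are i+1..j
        average_rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    r = sum(average_rank[x] for x in model_ratings)
    expected_r = n * (n + m + 1) / 2
    sd_r = math.sqrt(n * m * (n + m + 1) / 12)
    return (r - expected_r) / sd_r

print(round(rank_sum_z([7, 8, 9], [1, 2, 3]), 2))
```

For model ratings [7, 8, 9] against patient ratings [1, 2, 3], R = 15 and Z is about +1.96; swapping the two series gives about -1.96, and identical series give 0.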
62200	
62300	The overall mean of the Z scores was +.044, with standard
62400	deviation 1.68 (df=40). Thus the overall 95% confidence interval for
62500	the asymptotic mean value of Z is -.485 to +.573. The range of Z values
62600	is  -3.8  to +4.46. The length of the confidence interval is a result
62700	of the large variance which itself is mainly related to the  contrast
62800	between  the  weak and strong versions.  (See TABLES 2 and 3).   Once
62900	again the strong version of the  model  is  more  paranoid  than  the
63000	patients, the weak version less paranoid.
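The confidence interval can be recomputed from the summary figures using Student's t with 40 degrees of freedom. The sketch takes the midpoint of the reported interval, +.044, as the mean; the critical value 2.021 is the standard tabled two-sided 95% value for df = 40. It reproduces the reported bounds to within about .001.

```python
import math

# Summary figures from the text: 41 Z values (df = 40), SD = 1.68.
n_z = 41
sd_z = 1.68
mean_z = 0.044      # midpoint of the reported interval (-.485, +.573)
T_CRIT_40 = 2.021   # two-sided 95% critical value of Student's t, df = 40

half_width = T_CRIT_40 * sd_z / math.sqrt(n_z)
ci_low, ci_high = mean_z - half_width, mean_z + half_width
print(f"95% CI for mean Z: {ci_low:+.3f} to {ci_high:+.3f}")
```

The half-width of about .53 is driven by the large standard deviation, which is why an interval this wide still straddles 0.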
63100	
63200		(INSERT TABLE 2)
63300		(SUMMARY STATISTICS OF Z RATINGS BY GROUP)
63400	
63500	
63600	
63700	
63800	
63900	
64000	
64100	
64200	
64300		It is not surprising that the results from the two indices of
64400	resemblance run parallel, since the indices are highly interrelated.
64500	The mean Z value was +1.28 for the 15 interviews in which the model
64600	was rated more paranoid, +.41 for the 6 in which model and patient
64700	tied, and -.993 for the 20 in which the patient was rated more
64800	paranoid. A positive value of Z was observed 6 times when the patient
64900	was given a higher overall rating than the model, and a negative value
65000	of Z was observed twice when the model was rated more paranoid.
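The three group means can be pooled to recover the mean Z over all 41 interview pairs, weighting each group by its number of interviews (15 + 6 + 20 = 41). A short sketch of that arithmetic, reading the tied-group mean as +.41. The variable names are illustrative.

```python
# (number of interviews, mean Z) for each outcome of the overall-rating comparison
groups = {
    "model rated more paranoid": (15, 1.28),
    "model and patient tied": (6, 0.41),
    "patient rated more paranoid": (20, -0.993),
}

total_n = sum(n for n, _ in groups.values())
pooled_mean = sum(n * z for n, z in groups.values()) / total_n
print(total_n, round(pooled_mean, 3))
```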
65100	
65200	(INSERT TABLE 3)
65300	(Analysis of Variance of Z Ratings)
65400	
65500	
65600	
65700	
65800	
65900	
66000	
66100	
66200	
66300	
66400	
66500	
66600	
66700		It is worth emphasizing that these tests  invited  refutation
66800	of the model.   The experimental design of the tests put the model in
66900	jeopardy  of  falsification.   If  the paranoid model did not survive
67000	these tests, i.e.    if it were not  considered  paranoid  by  expert
67100	judges or if there were no correlation between the weak-strong
67200	versions of the model and the severity ratings of the judges, then no
67300	claim  regarding  the  success  of  the  simulation  could  be  made.
67400	Survival of a falsification procedure constitutes a validating step.
67500	
67600	6.7 ANALYSIS (3) THE MACHINE QUESTION
67700		For hundreds of years people have wondered how to distinguish
67800	a man from an imitation of a man. To distinguish a man from a statue,
67900	Galileo suggested tickling each with a feather.  To distinguish a man
68000	from a machine Descartes  suggested  conversational  tests.  Turing's
68100	conversational games have been discussed on p.00. We were curious how
68200	judges using transcripts might answer the machine question, i.e. the
68300	question of which interviewee is the human and which the computer model.
68400		To ask the machine-question, we sent  interview  transcripts,
68500	one   with  a  patient  and  one  with  the  model  (PARRY),  to  100
68600	psychiatrists  randomly  selected  from  the  Directory  of  American
68700	Specialists   and   the   Directory   of   the  American  Psychiatric
68800	Association.  Of  the  41 replies,  21   (51%)   made   the   correct
68900	identification while 20 (49%) were wrong. Based on this random sample
69000	of 41 psychiatrists, the 95% confidence interval for the percentage of
69100	correct identifications is 35.9% to 66.5%, which includes chance (50%).
69200		Psychiatrists are considered expert judges of patient
69300	interview behavior, but they are generally unfamiliar with computers.
69400	Hence we conducted the same test with 100 computer scientists randomly
69500	selected from the membership list of the Association for Computing
69600	Machinery (ACM). Of the 67 replies, 32 (48%) were right and 35 (52%)
69700	were wrong. Based on this random sample of 67 computer scientists, the
69800	95% confidence interval ranges from 36% to 60%, again including the
69900	chance level.
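Both reported intervals agree with the usual normal-approximation (Wald) interval for a proportion, p ± 1.96·√(p(1 − p)/n). A minimal sketch, assuming that this is the method used. The helper name is illustrative.

```python
from math import sqrt


def wald_interval(correct, n, z=1.96):
    """Normal-approximation 95% interval for a proportion, returned in percent."""
    p = correct / n
    half = z * sqrt(p * (1 - p) / n)
    return 100 * (p - half), 100 * (p + half)


psychiatrists = wald_interval(21, 41)
computer_scientists = wald_interval(32, 67)
for label, (low, high) in [("psychiatrists", psychiatrists),
                           ("computer scientists", computer_scientists)]:
    print(f"{label}: {low:.1f}% to {high:.1f}%, includes 50%: {low <= 50 <= high}")
```

The psychiatrists' interval comes out at 35.9% to 66.5%. The computer scientists' interval is 35.8% to 59.7%, which rounds to the 36 to 60 given above. Both contain 50%, the rate expected from guessing.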
70000		So both computer scientists and psychiatrists were unable to
70100	distinguish, at better than chance level, transcripts of interviews
70200	with the model from transcripts of interviews with real patients.
70300		But  what  do  we  learn from asking the machine question and
70400	finding that the distinction is not made? What we would most like  to
70500	know  is  how  to improve the model.  Simulation models do not spring
70600	forth in a complete, perfect and final form; they must  be  gradually
70700	developed over time. Perhaps the patient-model distinction could be
70800	made if we allowed a large number of expert judges to conduct the
70900	interviews themselves rather than study transcripts of interviews
71000	conducted by others. Such a result would indicate that the model must
71100	be improved, but unless we systematically investigated how the judges
71200	succeeded in making the discrimination, we would not know which aspects
71300	of the model to work on. The logistics of such a design are immense,
71400	and obtaining a large number of judges for sound statistical inference
71500	would require an effort incommensurate with the information yielded.
71600	
71700	6.8 ANALYSIS (4)  MULTIDIMENSIONAL EVALUATION 
71800		A more efficient and informative way to use Turing-like tests
71900	is to ask judges to rate teletyped interviews along scaled
72000	dimensions. This might be called asking the "dimension
72100	question". One can then compare scaled ratings of the patients and
72200	the model in order to determine precisely where and by how much they
72300	differ. In constructing our model we strove for one which showed
72400	indistinguishability  along  some  dimensions  and distinguishability
72500	along others. That is, we wanted the model to converge on what it  is
72600	supposed to simulate and to diverge from that which it is not.
72700		Paired-interview transcripts were sent to another 400
72800	randomly selected psychiatrists, asking them to rate the responses of
72900	the two "patients" along multiple dimensions. The total number of
73000	dimensions in this test was twelve: linguistic noncomprehension,
73100	thought disorder, organic brain syndrome, bizarreness, anger, fear,
73200	ideas of reference, delusions, mistrust, depression, suspiciousness
73300	and mania, all of them dimensions which psychiatrists commonly use in
73400	evaluating patients. The judges were divided into three groups, each
73500	group being assigned 4 of the 12 dimensions, and each judge was asked
73600	to rate the responses of each I-O pair in the interviews along the
73700	four dimensions assigned to his group.
73800			(INSERT TABLE 4 HERE)
73900		Table 4 shows there were significant  differences,  with  the
74000	model  (PARRY)  receiving  higher  scores  along  the  dimensions  of
74100	linguistic noncomprehension, thought  disorder,  bizarreness,  anger,
74200	mistrust  and  suspiciousness. On the delusion dimension the patients
74300	were  rated  significantly  higher.   There   were   no   significant
74400	differences along  the  dimensions  of  organic brain syndrome, fear,
74500	ideas of reference, depression and mania.
74600		Whereas   tests   asking   the   machine-question    indicate
74700	indistinguishability  at  the  gross  level,  a  study  of  the finer
74800	structure of  the  model's  behavior  through  ratings  along  scaled
74900	dimensions  shows significant differences between patients and model.
75000	These differences help to indicate which aspects of the model should
75100	be modified to improve its performance. The graph of Fig. 2 shows
75200	that no modifications are necessary along the dimension of "organic
75300	brain syndrome". But it is clear that the model's language
75400	comprehension could be improved. A future dimensional test would then
75500	show whether improvement has occurred and by how much.
75600	Successive identification of particular areas of failure  provides  a
75700	type  of  sensitivity  analysis  which  makes clear what improvements
75800	should be pursued in developing more adequate model versions.
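The text does not specify which test underlies Table 4. As one plausible illustration, the sketch below scores each dimension with Welch's two-sample t statistic. This is a swapped-in choice and not necessarily the authors' test. It then orders dimensions by |t|, so that the largest model-patient discrepancies, the first candidates for modification, come first.

The following are all assumptions:
- the ratings are invented;
- the dimension labels are placeholders;
- the 1.96 cutoff is only a rough large-sample criterion. With small samples, a t-table value at the Welch-Satterthwaite degrees of freedom would be more appropriate.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(model, patient):
    """Welch's two-sample t statistic; positive when the model is rated higher."""
    se = sqrt(variance(model) / len(model) + variance(patient) / len(patient))
    return (mean(model) - mean(patient)) / se


def rank_dimensions(ratings, critical=1.96):
    """Order dimensions by |t|, largest first, flagging those beyond a rough cutoff.

    `ratings` maps a dimension name to a (model_ratings, patient_ratings) pair.
    """
    scored = [(dim, welch_t(m, p)) for dim, (m, p) in ratings.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return [(dim, round(t, 2), abs(t) > critical) for dim, t in scored]


# Invented ratings on a hypothetical 0-9 scale; NOT data from the study.
example = {
    "dimension A": ([7, 8, 6, 9, 7, 8], [3, 4, 2, 5, 3, 4]),
    "dimension B": ([4, 5, 3, 4, 5, 4], [4, 3, 5, 4, 4, 5]),
}

for dim, t, flagged in rank_dimensions(example):
    print(f"{dim}: t = {t}, {'candidate for modification' if flagged else 'no clear difference'}")
```

Rerunning the same ranking on a later model version is one way to see whether a flagged dimension has moved toward the patients.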
75900		(INSERT FIG. 2 HERE)
76000	
76100	6.9 ANALYSIS (5)  A RANDOM MODEL
76200		Further evidence that the machine-question is too low a
76300	hurdle and too insensitive a test comes from the following
76400	experiment. In this test we constructed a random version of the
76500	paranoid model which used PARRY's output statements but produced
76600	them at random, no matter what the interviewer said. Two
76700	psychiatrists conducted interviews with this model; transcripts of
76800	these interviews were paired with patient interviews and sent to 200
76900	randomly selected psychiatrists, asking both the machine-question and
77000	the dimension-question. Of the 69 replies to the machine question, 34
77100	(49%) were right and 35 (51%) wrong. Based on this random sample of
77200	69 psychiatrists, the 95% confidence interval ranges from 39 to 63,
77300	again indicating chance-level identification. When a poor model, such
77400	as a random one, passes a test, it suggests the test is weak.
77500		(INSERT TABLE 5 HERE)
77600		Although a distinction is not made when  the  simple  machine
77700	question is asked, definite distinctions ARE made when judgements are
77800	requested  along  specific  dimensions.    As  shown  in   Table   5,
77900	significant  differences  appear  along  the dimensions of linguistic
78000	noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
78100	rated  higher.   On  these  particular  dimensions we can construct a
78200	continuum in which the random version  represents  one  extreme,  the
78300	actual patients another. Nonrandom PARRY lies somewhere between these
78400	two extremes, indicating that it performs significantly  better  than
78500	the random version but still requires improvement before it can be
78600	considered   indistinguishable   from   patients  relative  to  these
78700	dimensions. Table 6 presents t values for the differences between the
78800	mean ratings of PARRY and RANDOM-PARRY; the mean ratings themselves
78900	are given in Table 6 and Fig. 2.
79000		(INSERT TABLE 6 AND FIG 2 HERE)
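One way to make the patient-to-random continuum concrete is to express PARRY's mean rating on a dimension as a fraction of the distance from the patients' mean to RANDOM-PARRY's mean. This measure is an illustrative assumption, not one used in the study. On this scale 0 means indistinguishable from the patients and 1 means no better than the random version. The function name and the numbers are hypothetical.

```python
def continuum_position(patient_mean, model_mean, random_mean):
    """Place the model on the patient-to-random continuum for one dimension.

    0.0: model's mean rating equals the patients'; 1.0: as far off as RANDOM-PARRY.
    Values between 0 and 1 mean the model sits between the two extremes.
    """
    span = random_mean - patient_mean
    if span == 0:
        raise ValueError("patients and random model are not separated on this dimension")
    return (model_mean - patient_mean) / span


# Invented mean ratings for a single dimension; not figures from Table 5 or 6.
print(continuum_position(patient_mean=2.0, model_mean=4.0, random_mean=7.0))  # prints 0.4
```

Tracking this fraction across successive model versions would show, dimension by dimension, how far the model has moved toward the patients' end of the continuum.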
79100		These studies indicate that a more useful way to use Turing-like
79200	tests is to ask expert judges to make ratings along multiple
79300	dimensions that are essential to the model. The model can thus
79400	serve as an instrument for its own perfection. A good validation
79500	procedure has criteria for better or worse approximations. Useful
79600	tests do not necessarily prove a model; they probe it for its
79700	strengths and weaknesses and clarify what is to be done next in
79800	modifying and repairing the model. Simply asking the machine-question
79900	yields little information relevant to what the model builder most
80000	wants to know, namely, along which dimensions the model needs to be
80100	modified in order to improve its performance.
80200	
80300		To  conclude,  it  is  perhaps  historically significant that
80400	these tests were conducted at all. To my knowledge, no  one  to  date
80500	has  subjected  an  interactive  simulation  model  of human symbolic
80600	processes to dimensional indistinguishability tests. These tests set
80700	a precedent and provide a standard against which competing models
80800	can be measured.